Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR

نویسندگان

  • Masatsune Tamura
  • Takashi Masuko
  • Keiichi Tokuda
  • Takao Kobayashi
چکیده

This paper describes a technique for synthesizing speech with an arbitrary speaker characteristics using speaker independent speech units, which we call “average voice” units. The technique is based on an HMM-based text-to-speech (TTS) system and MLLR adaptation algorithm. In the HMM-based TTS system, speech synthesis units are modeled by multi-space probability distribution (MSD) HMMs which can model spectrum and pitch simultaneously in a unified framework. We derive an extension of the MLLR algorithm to apply it to MSD-HMMs. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from adapted models using only four sentences is very close to that from speaker dependent models trained using 450 sentences.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text-to-speech synthesis with arbitrary speaker's voice from average voice

This paper describes a technique for synthesizing speech with any desired voice. The technique is based on an HMM-based text-to-speech (TTS) system and MLLR adaptation algorithm. To generate speech of an arbitrarily given target speaker, speaker-independent speech units, i.e., average voice models, is adapted to the target speaker using MLLR framework. In addition to spectrum and pitch adaptati...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

MLLR adaptation for hidden semi-Markov model based speech synthesis

This paper describes an extension of maximum likelihood linear regression (MLLR) to hidden semi-Markov model (HSMM) and presents an adaptation technique of phoneme/state duration for an HMM-based speech synthesis system using HSMMs. The HSMM-based MLLR technique can realize the simultaneous adaptation of output distributions and state duration distributions. We focus on describing mathematical ...

متن کامل

Speaker adaptation for HMM-based speech synthesis system using MLLR

This paper describes a voice characteristics conversion technique for an HMM-based text-to-speech synthesis system. The system uses phoneme HMMs as the speech synthesis units, and voice characteristics conversion is achieved by changing HMM parameters appropriately. To transform the voice characteristics of synthetic speech to the target speaker, we apply an MLLR (Maximum Likelihood Linear Regr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001